Rank | Count | Beginning |
---|---|---|
244591 | 19165 | V |
83027 | 9555 | Leta |
156914 | 7333 | Po |
109362 | 6686 | Na |
98906 | 3953 | Med |
277141 | 3538 | Za |
230795 | 3212 | To |
71653 | 2829 | Ko |
181583 | 2813 | Pri |
119120 | 2562 | Naselje |
142766 | 2511 | Od |
25927 | 2333 | Demografija |
220141 | 2213 | Ta |
198377 | 2189 | S |
282133 | 2174 | Zaradi |
290996 | 2097 | Zgodovina |
221160 | 2052 | Tako |
277115 | 1976 | Z |
66721 | 1670 | Ker |
137562 | 1613 | Ob |
162418 | 1532 | Poleg |
76748 | 1507 | Kót |
237272 | 1463 | Tudi |
59599 | 1397 | Je |
16146 | 1356 | Če |
294150 | 1224 | Življenje |
29829 | 1171 | Do |
69134 | 1103 | Kljub |
11239 | 1102 | Bil |
54846 | 996 | Iz |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
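The same count can be sketched outside the database, which also makes it easy to vary N. The following is a minimal Python sketch, not the tooling used to produce the table. The function name `sentence_beginnings` and the sample sentences are hypothetical. It assumes that words are separated by single spaces, as in the query above.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the most frequent n-word sentence beginnings with their counts."""
    counts = Counter()
    for sentence in sentences:
        # Like substring_index(sentence, ' ', n): keep everything before the
        # n-th space, or the whole sentence if it has fewer than n spaces.
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # Sorted by descending count, like "order by cnt desc limit <limit>".
    return counts.most_common(limit)


# Hypothetical toy sentences, not taken from the corpus.
sample = [
    "V mestu je muzej.",
    "V hiši je tiho.",
    "Leta 1991 je bila razglašena samostojnost.",
]
print(sentence_beginnings(sample, n=1))  # [('V', 2), ('Leta', 1)]
```

Calling the function with `n=2`, `n=3` or `n=4` gives the counts for longer beginnings under the same space-splitting assumption.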